Fix169 by janvanrijn · Pull Request #241 · openml/openml-python

janvanrijn · 2017-04-27T14:55:23Z

No description provided.

codecov-io · 2017-05-02T13:46:11Z

Codecov Report

Merging #241 into develop will increase coverage by 0.46%.
The diff coverage is 94.18%.

@@             Coverage Diff             @@
##           develop     #241      +/-   ##
===========================================
+ Coverage    88.36%   88.82%   +0.46%     
===========================================
  Files           23       24       +1     
  Lines         1960     2032      +72     
===========================================
+ Hits          1732     1805      +73     
+ Misses         228      227       -1

Impacted Files               Coverage Δ
openml/runs/run.py           89.85% <100%> (+3.04%)
openml/setups/setup.py       100% <100%> (ø)
openml/runs/__init__.py      100% <100%> (ø)
openml/setups/__init__.py    100% <100%> (ø)
openml/setups/functions.py   98.46% <100%> (-1.54%)
openml/testing.py            98.11% <100%> (+0.03%)
openml/runs/functions.py     83% <78.26%> (+0.08%)
... and 1 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 65e8758...85472cc.

mfeurer

Sorry, I have only gotten through half of this PR so far. Will continue tomorrow.

mfeurer · 2017-04-28T10:50:35Z

+    for param_name in sorted(model_params):
+        if 'random_state' in param_name:
+            currentValue = model_params[param_name]
+            # important to draw the value at this point (and not in the if statement)


Hm, could you explain why? It's not clear to me from this.

Added description.

I'm not sure if we really need this, but it seems like a nice property to respect.
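
The property under discussion can be illustrated with a minimal sketch. This is not the PR's code: it uses a plain dict in place of `model.get_params()`, Python's `random.Random` in place of NumPy's generator, and a hypothetical helper name `draw_seeds`. Because a value is drawn for every `random_state` slot, whether or not that slot is later seeded, a given slot receives the same seed regardless of which other slots the user has already set.

```python
import random


def draw_seeds(params, seed):
    """Return a seed for every unseeded ``random_state`` entry in ``params``.

    Hypothetical helper for illustration; ``params`` stands in for the
    dict returned by ``model.get_params()``.
    """
    rng = random.Random(seed)
    seeds = {}
    for name in sorted(params):
        if 'random_state' in name:
            # Draw unconditionally: the generator advances once per slot,
            # even when this slot already carries a user-supplied seed.
            value = rng.randrange(2 ** 16)
            if params[name] is None:
                seeds[name] = value
    return seeds


all_unseeded = draw_seeds(
    {'a__random_state': None, 'b__random_state': None}, seed=1)
one_preseeded = draw_seeds(
    {'a__random_state': 42, 'b__random_state': None}, seed=1)

# 'b' receives the same seed in both cases.
assert all_unseeded['b__random_state'] == one_preseeded['b__random_state']
```

Had the draw been placed inside the `if params[name] is None` branch, pre-seeding `a` would shift the generator and give `b` a different seed.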

mfeurer · 2017-04-28T10:51:08Z

        return False

+def _get_seeded_model(model, seed=None):
+    '''Sets all the non-seeded components of a model with a seed.


You should mention the restriction that one cannot use a random state in the pipelines.

I'm not sure. One could argue that that is a restriction of the run_tasks function, as that is the function the user interacts with. Furthermore, that function is responsible for the check.

This function only adds seeds to unseeded models

But it can raise an exception:

import numpy as np
import sklearn.ensemble
import openml

rf = sklearn.ensemble.RandomForestClassifier(
    random_state=np.random.RandomState(1))
openml.runs.functions._get_seeded_model(rf, 5)

But you're right, it should be documented in the run_tasks() function.

Mea culpa, I will add it.
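
A sketch of the kind of check being discussed, under stated assumptions: the helper name `needs_seed` is hypothetical, and a `random.Random` instance stands in for the NumPy `RandomState` object from the snippet above. Integer seeds are left alone, `None` is marked for seeding, and a generator object is rejected with an exception.

```python
import random


def needs_seed(current_value):
    """Decide whether a ``random_state`` value should be replaced by a seed."""
    if current_value is None:
        return True   # unseeded: a seed should be filled in
    if isinstance(current_value, int):
        return False  # already seeded by the user: keep it
    # Generator objects (or anything else) cannot be stored as a plain seed.
    raise ValueError(
        'random_state must be None or an int, got %r' % type(current_value))


assert needs_seed(None) is True
assert needs_seed(5) is False
try:
    needs_seed(random.Random(1))
except ValueError as error:
    print('rejected:', error)
```

Whichever function performs the check, documenting the exception on the user-facing function is what tells callers why their pipeline was rejected.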

mfeurer · 2017-04-28T10:51:45Z


    return run

+def initialize_model_from_run(run_id):


This is neither used nor tested.

Agreed, I added tests.

mfeurer

I'm mostly through; I only need to understand why test_existing_setup_exists has to use a different classifier.

mfeurer · 2017-05-03T12:34:30Z

+
+       Parameters
+       ----------
+       flow_id : int


It looks like you copied the docstring from the object above and didn't adapt it.

I loove copy/pasting. fixed it.

mfeurer · 2017-05-03T12:41:39Z

+                    _current['oml:component'] = main_id
                else:
-                    raise ValueError("parameter %s not in flow description of flow %s" %(param,flow.name))
+                    _current['oml:component'] = _param_dict[_flow.name]


Why is it once an ID, and once a name?

mfeurer · 2017-05-03T12:51:33Z


    @staticmethod
-    def _parse_parameters(model, flow):
+    def _parse_parameters(model, server_flow):


Looking at this again I'm actually surprised that this is not called run_task, but only when publishing. But maybe this should be its own issue/PR.

Can you elaborate a bit? I don't really understand which / why.

Sorry, I meant "that this is not called IN run_task"

mfeurer · 2017-05-03T12:53:17Z

+            current = openml.setups.get_setup(setups[idx])
+            assert current.flow_id > 0
+            if num_params[idx] == 0:
+                assert current.parameters is None


Could you please use self.assert in the unit tests? It gives nicer outputs.

You mean self.asserts()?
Sure.

janvanrijn · 2017-05-03T14:49:28Z

test_existing_setup_exists makes use of sentinels. That gives us a general problem when serializing a flow (or setup) and comparing it to one on the server, because the flow (setup) serialization does not know that a sentinel string was used, or which one. For that reason, I slightly generalized one of the functions so that it does not rely on name mappings for the main flow (a boolean flag indicates a function call at depth 1). This way, we do not have to remove the sentinels, and in this context we can still test flows without subflows.

mfeurer · 2017-05-03T15:01:10Z

            assert current.flow_id > 0
            if num_params[idx] == 0:
-                self.asserts(current.parameters is None)
+                self.assertTrue(current.parameters is None)


Sorry, I should have been more specific. The correct one here would be self.assertIsNone.

mfeurer · 2017-05-03T15:01:23Z

+                self.assertTrue(current.parameters is None)
            else:
-                self.asserts(len(current.parameters) == num_params[idx])
+                self.assertTrue(len(current.parameters) == num_params[idx])


Sorry, I should have been more specific. The correct one here would be self.assertEqual.

mfeurer · 2017-05-03T15:06:05Z

Thanks for the explanation of the sentinels. Maybe it would be good to add that to the actual unit test. What I'm wondering right now is whether this is safe with respect to running the unit tests in parallel (which happens on travis-ci). You assume:

# although the flow exists, we can be sure there are no
# setups (yet) as it hasn't been ran

It could happen that it was already run. Maybe adding a sentinel to a hyperparameter? Or is there some other way of making a setup unique?

Once we figured out this part, I think we can merge this PR before it becomes too big and create new PRs for the missing functionality.
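
One way to make such a test robust against parallel runs, sketched under assumptions: the helper `with_sentinel` and the `TEST` prefix are hypothetical and not taken from openml-python. A fresh random token is generated on every call, so two concurrent test runs would not produce the same name. The same token could equally be attached to a string-valued hyperparameter.

```python
import uuid


def with_sentinel(name):
    """Return ``name`` prefixed with a sentinel unique to this call."""
    sentinel = 'TEST' + uuid.uuid4().hex[:10]
    return '%s_%s' % (sentinel, name), sentinel


first, first_sentinel = with_sentinel('sklearn.tree.DecisionTreeClassifier')
second, _ = with_sentinel('sklearn.tree.DecisionTreeClassifier')

# Each call yields a distinct name; a collision is astronomically unlikely.
assert first != second
assert first.startswith(first_sentinel)
```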

janvanrijn · 2017-05-04T14:24:03Z

I had already added this in the comment:

# because of the sentinel, we can not use flows that contain subflows

Also, the other comment was slightly confusing. I changed this to:

# although the flow exists (created as of previous statement),
# we can be sure there are no setups (yet) as it was just created
# and hasn't been ran

Should be fine now? Let's merge

janvanrijn added 14 commits April 25, 2017 15:52

seed functionality

10940a2

implemented setup data structure

982898e

parsing parameter values into setup

8ceb290

update run with new parameter extraction procedure

ae85e5b

Merge branch 'develop' into fix169

7f59df2

# Conflicts: # openml/runs/functions.py

fixed merge conflict bug,

83662d7

reimplemented extract parameters from run (based on sklearn converter)

functionality to reconstruct a flow using a given set of parameters

71b5efd

clarifications to _to_dict_of_dicts function

af78436

almost finished reinstatiating setups

e37bb93

implemented some sort of unit test. should be improved

08e2ae8

Merge pull request #237 from openml/develop

280a515

Development into fix169

Merge pull request #240 from openml/develop

6fdba65

Develop into fix169 (II)

finalized instantiate setup check

b8ced46

fix unit tests for setup

363b381

mfeurer requested changes May 2, 2017


janvanrijn added 2 commits May 2, 2017 18:26

requests from @mfeurer

3ae5f8e

added comment

70475fe

mfeurer requested changes May 3, 2017


janvanrijn added 2 commits May 3, 2017 16:34

changed comments of setup, changed assertions

23aaf81

typo

bd8ee24

mfeurer requested changes May 3, 2017


changes requested

85472cc

mfeurer approved these changes May 4, 2017


mfeurer merged commit 89a3dd4 into develop May 4, 2017

mfeurer deleted the fix169 branch May 4, 2017 14:41

mfeurer mentioned this pull request May 5, 2017

Set seed for run #169

Closed


3 participants
